This mini-review summarizes techniques applied in, and results obtained with, proteomic
studies of human immunodeficiency virus type 1 (HIV-1)-T cell interaction. Our group
previously reported on the use of two-dimensional differential gel electrophoresis (2D-DIGE)
coupled to matrix-assisted laser desorption time-of-flight peptide mass fingerprint analysis
to study T cell responses upon HIV-1 infection. Only one in three differentially expressed
proteins could be identified using this experimental setup. Here we report on our latest 
efforts to test models generated by this data set and extend its analysis by using novel 
bioinformatic algorithms. The 2D-DIGE results are compared with other studies including a 
pilot study using one-dimensional peptide separation coupled to MS^E, a novel mass
spectrometric approach. We conclude that although the latter method detects fewer
proteins, it is much faster and less labor intensive. Last but not least, recent developments 
and remaining challenges in the field of proteomic studies of HIV-1 infection and proteomics 
in general are discussed. 
INTRODUCTION

Human immunodeficiency virus type 1 (HIV-1), the causative 
agent of AIDS, uses CD4+ T cells as a host. In order to do so 
efficiently the virus adapts the host cell's intracellular metabolism. 
The host cell, in turn, initiates intracellular antiviral responses 
and signals to the host's immune system (Lever and Jeang, 2011). 
Thus, HIV-1 infection and the host response trigger many
physiological changes in the infected cell (Gomez and Hope, 2005).
HIV-1 survives and persists in infected cells preparing them for 
production and release of new viral particles. Intracellular changes 
due to HIV-1 infection have been studied extensively, focusing 
on the contribution of HIV-1's accessory proteins to these
processes, using microarrays or serial analysis of gene expression
(SAGE) to detect mRNA changes in the cell (Van't Wout et al.,
2003; Giri et al., 2006; Roeth and Collins, 2006; Lefebvre et al.,
2011; Wu et al., 2011). Gene expression profiling with microarrays
is easy to perform, generating large datasets quickly
(Heller, 2002), but sequences must be known in advance, which 
SAGE does not require. SAGE, based on direct sequencing of 
mRNA tags, also does not use hybridization as microarrays do, 
leading to more reliable probing of mRNA levels. SAGE is cur- 
rently being replaced by high-throughput sequencing technologies 
(RNA-Seq; Baginsky et al., 2010). Proteome changes upon HIV
infection have also been studied in detail with mass
spectrometry (Coiras et al., 2006; Chan et al., 2007; Ringrose et al., 2008;
Navare et al., 2012), lately focusing on studies specifically
monitoring direct interactions between viral and cellular proteins
(Jager et al., 2012a,b).

Changes in gene expression patterns characterize the cellular 
response to HIV-1 infection. However, changes in mRNA levels
are only part of the story. A stringent correlation between
mRNA and protein levels is often lacking (Pradet-Balade et al., 2001).
In human cells, transcription seems to explain only 30% of the
variation in protein levels, with translation and protein degradation
contributing up to 40% (Vogel et al., 2010; Schwanhausser et al.,
2011). In E. coli, the relative contributions of transcriptional
and translational control to the regulation of protein levels have
even been shown to vary greatly with the kind of signal the
cell responds to (Kramer et al., 2010). Direct cellular responses
are also strongly accompanied by coordinated protein modifica- 
tions. A protein can exist in many different isoforms, each with 
its own specific function, with a relatively limited number of 
genes giving rise to vast amounts of (functionally) distinct proteins 
(Jensen, 2006). This is mostly accomplished by post-translational 
protein modification (PTM). PTMs constitute highly versatile 
systems allowing cells to respond very quickly to both external 
and internal signals, as illustrated by protein phosphorylation 
in signal transduction or metabolic regulation. Of course, such 
PTM responses cannot be detected using DNA/RNA sequencing 
technologies. Thus, proteomic studies using mass spectrometry 
to detect and quantify differences in protein expression,
protein isoforms and complexes, as well as PTMs, are essential for
understanding the complete set of intracellular responses to
HIV-1 infection. In this way new insights and intervention strategies
can be developed.

In a previous study, we used the fluorescence two-dimensional 
differential gel electrophoresis (2D-DIGE) technique for a
comparison of uninfected and HIV-1 infected T cells (Ringrose et al.,
2008). This technique starts out with minimal protein labeling
using cyanine-based fluorescent probes that react with lysine. A
subsequent two-dimensional gel electrophoresis allows the
quantification of changes in protein expression by mixing cell extracts
labeled either with Cy3 or Cy5 and running them on a single gel
(Unlu et al., 1997; Alban et al., 2003). Next, differentially expressed
proteins can be identified by peptide mass fingerprinting (PMF) 
using a matrix assisted laser-desorption time of flight (MALDI- 
TOF) mass spectrometer. PMF uses lists of masses of peptides 
("fingerprints") generated by tryptic digestion of proteins for 
their identification. NB: In this approach quantification is based 
on amount of fluorescence and not on ion detection level in a 
mass spectrometer. The study confirmed several HIV-1 effects 
on pathways and cellular processes previously described using 
stable isotope labeling combined with liquid chromatography-
mass spectrometry (LC-MS; Chan et al., 2007). But there were
novel findings as well, most importantly the downregulation of 
proteins involved in glycolysis upon full-blown HIV-1 infection, 
presumably part of a complete metabolic rerouting to preserve 
glucose for the pentose phosphate pathway, the source of riboses 
for subsequent viral nucleic acid synthesis (Ringrose et al., 2008).
Despite its success, this 2D-DIGE PMF approach
has limitations: the technique is very labor
intensive, and about two-thirds of all the differentially expressed
proteins detected could not be identified using PMF because they
were not present in sufficient abundance. As we are planning
to extend our proteomic analysis of HIV-1-T cell interaction to
subcellular fractions, a faster method would be preferable. To 
that end we compared the 2D-DIGE PMF with one-dimensional 
separation of peptides using reversed-phase LC coupled to MS^E
analysis (Geromanos et al., 2009), again using T cells infected with
HIV-1. In this approach target proteins are digested with trypsin
(as in the PMF method mentioned above), and the resulting
peptides (parent ions in Figure 1) are assigned to
proteins by mass analysis of their fragments (daughter
ions in Figure 1). This allows peptide sequencing [in both the
data-dependent acquisition (DDA) and MS^E applications
described below] as well as protein quantitation by peptide
signal abundance.
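
The peptide mass fingerprinting step described above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the 50 ppm tolerance, the minimum peptide length, and the rule of ignoring missed cleavages and modifications are all assumptions made for illustration. The sketch digests a protein sequence in silico with trypsin rules and checks which theoretical peptide masses are matched by an observed peak list.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # MALDI mostly yields singly protonated ions


def tryptic_peptides(sequence, min_length=5):
    """Cleave after K or R unless followed by P (assumption: no missed cleavages)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def matched_peptides(observed_mz, sequence, tol_ppm=50.0):
    """Return the theoretical tryptic peptides matched by at least one observed peak."""
    matched = []
    for peptide in tryptic_peptides(sequence):
        theoretical = mh_plus(peptide)
        if any(abs(obs - theoretical) / theoretical * 1e6 <= tol_ppm
               for obs in observed_mz):
            matched.append(peptide)
    return matched
```

In a real search, each database protein would be scored by the number (or weighted fraction) of matched peptides, which is why low-abundance spots yielding few peaks often remain unidentified.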

A PILOT STUDY USING LC-MS^E

One of the most exciting new developments in proteomic analy- 
ses is the possibility to perform quantitative protein comparisons 
without having to introduce quantifiable labels: label-free pro- 
teomics. Here we report on the use of label-free proteomics 
in a pilot study of T cells (PM1 T cell line) infected with
HIV-1 (LAI isolate). In this setup a novel data-independent
alternate scanning technique (MS^E) on a quadrupole time-of-flight
(QTOF) instrument is used. In contrast to DDA, which is
the standard method on various types of instruments used in
peptide-based proteomics, MS^E does not select a single
precursor ion for fragmentation but rather fragments "all" ions
present at any given time during chromatographic separation.
As such, mass spectrometric data are collected (in principle) on
fragments of "all" ions instead of a subset that is selected for
fragmentation during DDA analysis. This decreases bias toward
selecting only highly abundant peptides and eliminates the need
to measure samples multiple times in order to collect tandem-MS
data for "all" ions present (Figure 1). In this manner, MS^E
greatly expands the number of peptides detected using limited
LC separation compared to DDA on QTOF-type instruments
(Geromanos et al., 2009).
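
Because all precursors are fragmented together, fragment ions have to be reassigned to their parent ions afterwards on the basis of co-elution (Figure 1). The Python sketch below illustrates that idea only; it is not the instrument vendor's ion-accounting algorithm. The use of a Pearson correlation between elution profiles, the 0.9 cut-off, and the function names are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long intensity profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no elution information
    return cov / sqrt(var_a * var_b)


def assign_fragments(precursors, fragments, min_corr=0.9):
    """Group fragment ions under the precursor they co-elute with.

    precursors, fragments: {m/z: [intensity per scan]} sampled on the same
    scan axis (low-energy and elevated-energy traces, respectively).
    Returns {precursor m/z: [fragment m/z, ...]}; fragments whose best
    correlation stays below min_corr (hypothetical cut-off) are left out.
    """
    spectra = {mz: [] for mz in precursors}
    for frag_mz, frag_profile in fragments.items():
        best_mz, best_r = None, min_corr
        for prec_mz, prec_profile in precursors.items():
            r = pearson(prec_profile, frag_profile)
            if r >= best_r:
                best_mz, best_r = prec_mz, r
        if best_mz is not None:
            spectra[best_mz].append(frag_mz)
    return spectra
```

The reconstructed fragment lists play the role of conventional tandem-MS spectra and can be passed to a database search engine.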

A drawback of MS^E is its incompatibility with quantitation
schemes that make use of an amine-reactive isotopic label and
specific reporter fragment ions to ascertain protein quantity, such
as iTRAQ (isobaric tag for relative and absolute quantification;
Wiese et al., 2007). The iTRAQ approach was used very recently for
quantitation of early effects of HIV infection (Navare et al., 2012) using
multi-dimensional separation and an Orbitrap mass
spectrometer (Makarov and Scigelova, 2010), in which 1448 proteins were
reliably quantified. However, LC-MS^E is well suited for label-free
quantitation, and pilot studies applying it to our model system
(uninfected PM1 T cells vs. cells at the peak of HIV-1
infection) are promising. So far we could quantify 358 proteins, with
at least 16 proteins clearly up- or downregulated (more than 
twofold). Six enzymes involved in glycolysis were identified. Con- 
sistent with our previous observations these were found either 
to be hardly changed or downregulated. Several other proteins 
found to be changed in abundance previously (Ringrose et al.,
2008) were again detected, but whereas, e.g., Stathmin (Q96CE4)
is downregulated as before, several 14-3-3 proteins are now
upregulated instead of downregulated (see Discussion). Total numbers
of identified proteins are obviously lower than in the 2D-DIGE
approach, but the technique is much faster and less labor
intensive (days vs. months). Also, as lower amounts of protein are
needed for analysis, smaller and more reproducible cell
culture samples can be used. In the future we plan to combine
this approach with in-line enrichment of phosphopeptides using
titanium dioxide chromatography (Pinkse et al., 2004, 2011) to
look at changes in the cellular phosphoproteome upon HIV-1
infection. In addition, LC-MS^E will be used with cell lines
containing an inducible HIV-1 provirus (Jeeninga et al., 2008). This
allows a more synchronous induction of virus production com- 
pared to viral infection, increasing the sensitivity of the assay 
such that small biological changes can be detected. This will 
also make it feasible to discriminate between changes induced by 
the initial virus infection and the subsequent stage of new virus 
production. 
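
The "more than twofold" criterion used above for label-free data can be written down as a short Python sketch. It assumes that each protein has already been reduced to one summed or averaged peptide signal per condition; the function name and the dictionary layout are hypothetical, and no statistical testing across replicates is included.

```python
from math import log2


def regulated_proteins(control, infected, fold=2.0):
    """Return {protein: log2(infected/control)} for changes larger than `fold`.

    control, infected: {protein accession: label-free abundance}.
    Proteins missing from either condition, or with non-positive
    abundance, are skipped because no ratio can be formed.
    """
    threshold = log2(fold)
    changed = {}
    for protein in control.keys() & infected.keys():
        c, i = control[protein], infected[protein]
        if c <= 0 or i <= 0:
            continue
        ratio = log2(i / c)
        if abs(ratio) > threshold:  # strictly more than `fold` up or down
            changed[protein] = ratio
    return changed
```

Working on the log2 scale treats up- and downregulation symmetrically: a value above 1 means more than twofold upregulated, a value below -1 more than twofold downregulated.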

FIGURE 1 | An overview of LC-MS^E. Separation of peptides (colored dots)
on an LC coupled to a QTOF instrument in a data-independent mode of
acquisition using electrospray ionization (A). In a data-independent mode of
acquisition the quadrupole (MS1) continuously allows passage of all ions, in
contrast to DDA where the quadrupole selects ions for fragmentation based
on their intensity. Energy in the collision cell is continuously cycled between a
low and an elevated profile. This generates spectra of all parent (B) and
daughter ions (D) throughout the LC-MS run in the time-of-flight analyzer
(MS2) without bias with respect to their relative intensities. In order to
reconstruct fragment ion spectra with daughter ions from a single parent
only, an ion-accounting algorithm compares the retention time profiles of all
individual parent ions (C) to all individual daughter ions (E), matching them on
the basis of retention time profile and intensity (G). In this manner the
algorithm creates a reconstructed daughter ion spectrum matched to a single
precursor (F) that can be used by proteome search engines to identify
peptides and link these to proteins. The unbiased data-independent scanning
mode greatly expands the number of peptides, and thus of proteins,
detected, compared to the DDA mode of acquisition on QTOF-type
instruments, especially using limited LC separations on complex mixtures to
increase throughput. Adapted from Plumb et al. (2006).

FOLLOW-UP RESEARCH USING RNAi-MEDIATED
KNOCKDOWN OF CELL FACTORS

A follow-up study of some of the proteins identified in the 2D-DIGE
study was performed with an RNA interference (RNAi)
knockdown screen. Protein induction may reflect host defensive
mechanisms to prevent or restrict virus infection or replication.
Alternatively, such changes may represent a viral strategy to induce
cellular factors facilitating specific steps of the replication cycle
(cofactors). For 76 cellular targets the impact on HIV-1 replication
was studied upon mRNA knockdown, using short hairpin
RNA (shRNA) inhibitors from the Mission™ library (Moffat
et al., 2006). For each target gene, four to five shRNAs were used to
generate stably transduced T cells, thus reducing the chance
of scoring off-target effects. Knockdown of 38 individual mRNA
targets resulted in decreased virus replication, possibly because
of suppression of a viral cofactor. Of these, 27 proteins were
upregulated during HIV-1 infection in our previous 2D-DIGE
proteomic screen, fitting the cofactor role. For three targets an
increase in viral replication was observed, raising the possibility
that a viral restriction factor was hit (unpublished results).

BIOINFORMATIC ANALYSIS OF 2D-DIGE DATA 

As mentioned above, one of the most severe limitations of the 
2D-DIGE PMF approach lies in the fact that about two-thirds 
of all the differentially expressed proteins detected cannot be
identified using PMF, as they are not sufficiently abundant. This
reflects the major challenge in all proteomic studies: identification
and (relative) quantification of proteins with lower abundances.
We detected 1920 spots, of which 15% (288) were differentially
expressed at 7-10 days post-infection (p.i.; Ringrose et al., 2008).
Of these 288 differentially expressed protein spots, 182 remain
to be identified. However, we have some additional
information regarding these unidentified protein spots: we know the pI
and Mw of the protein, i.e. of the specific isoform(s) detected,
which in most cases represent the most abundant, mature pro- 
tein form(s). We can also surmise what pathways the proteins 
most likely are involved in, based on the results obtained for the 
~100 identified spots. Using this information we are develop- 
ing bioinformatic algorithms to come up with accurate lists of 
candidate differentially expressed proteins upon virus infection. 
Obviously, such candidates have to be confirmed experimentally, 
checked for instance with highly sensitive antibody-based methods 
such as western blotting. We developed a prioritization approach
consisting of five steps (Figure 2). First, for the PMF-identified
proteins the theoretical pI and Mw are computed for the mature
form using "Compute pI/Mw"^1. Second, two non-linear models
are fitted to predict pI and Mw, respectively. Parameters of
the models are estimated from the x/y-coordinates of the identified
spots and the pI and Mw determined in the previous step.
Next, we use these models to predict pI and Mw for unidentified
spots. Third, for each unidentified spot a list of candidate
proteins is determined using the estimated pI and Mw as input
for TagIdent^2. TagIdent requires specifying a window size for the
estimated pI and Mw. Because of uncertainty in the pI and Mw
values, we choose relatively large windows so that the candidate
list is likely to include the correct protein. However, candidate
lists thus often contain hundreds of proteins. This is addressed
in the last two steps, using the principle of "guilt by association":
more likely candidate proteins share more features with already
identified proteins, e.g., being present in the same pathway. In
the fourth step of our algorithm the physical and/or functional
interactions of the ~100 identified proteins are extracted from
the Search Tool for the Retrieval of Interacting Genes (STRING)
database (Szklarczyk et al., 2011). STRING covers co-occurrence
in pathways, physical protein-protein interactions, co-occurrence
in the abstracts of scientific reports, etc., and provides confidence
scores for the strengths of the associations. In step 5, candidate
proteins are ranked using confidence scores by summing weighted
interactions with our "identified protein" STRING set. Combined
with this kind of bioinformatic algorithm, 2D-DIGE would
become one of the first techniques able to identify differentially
expressed proteins in the lower regions of the dynamic range.

FIGURE 2 | How to identify candidate proteins based on pI and Mw. For details see text.

^1 http://web.expasy.org/compute_pi/
^2 http://web.expasy.org/tagident/
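
The five prioritization steps described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not our actual implementation: the form of the two models is not specified above, so a straight line for pI versus gel x-position and a log-linear fit for Mw versus gel y-position are used as stand-ins; the window sizes are arbitrary; the candidate lookup works on a local table instead of querying TagIdent; and STRING scores are summed with equal weights. All function names are hypothetical.

```python
from math import log10


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x (needs at least two distinct xs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def calibrate(identified_spots):
    """Steps 1-2: identified_spots is a list of (x, y, theoretical pI, theoretical Mw).

    Returns a function mapping gel coordinates (x, y) to (pI, Mw).
    Assumed model forms: pI linear in x, log10(Mw) linear in y.
    """
    xs, ys, pis, mws = zip(*identified_spots)
    pi_a, pi_b = fit_line(xs, pis)
    mw_a, mw_b = fit_line(ys, [log10(m) for m in mws])

    def predict(x, y):
        return pi_a + pi_b * x, 10 ** (mw_a + mw_b * y)

    return predict


def candidates_in_window(pi, mw, proteome, pi_window=0.5, mw_window=0.2):
    """Step 3: proteome is {protein: (pI, Mw)}; mw_window is a relative tolerance."""
    return [name for name, (p, m) in proteome.items()
            if abs(p - pi) <= pi_window and abs(m - mw) <= mw_window * mw]


def rank_candidates(candidates, identified, string_scores):
    """Steps 4-5: rank by summed confidence of links to identified proteins.

    string_scores: {frozenset({protein_a, protein_b}): confidence in [0, 1]}.
    """
    def score(candidate):
        return sum(string_scores.get(frozenset((candidate, known)), 0.0)
                   for known in identified)

    return sorted(candidates, key=score, reverse=True)
```

Wide windows in step 3 keep the true protein in the list at the cost of many false candidates; the interaction-based ranking in the last step is what brings the plausible ones to the top for experimental confirmation.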

DISCUSSION 

A comparison of proteomic studies that address HIV-T cell
interaction shows that the various approaches yield a wide range
in reported numbers of quantifiable proteins, from 3255
(Chan et al., 2007) and 1448 (Navare et al., 2012) total
quantifiable proteins for techniques using multi-dimensional separation of
peptides to 92 differentially expressed proteins (out of 1920 spots)
in a 2D-DIGE approach (Ringrose et al., 2008). In contrast, the
numbers of differentially expressed proteins are somewhat
comparable at various times p.i. Ringrose et al. (2008) reported 9 (42 h
p.i.) and 92 (7-10 days p.i.) regulated proteins upon infection,
while the group of Katze detected 687 (36 h p.i.) changed proteins
(Chan et al., 2007) and found 266 (4 h p.i.), 60 (8 h p.i.), and
22 (20 h p.i.) proteins differentially expressed earlier on in
infection (Navare et al., 2012). Although numbers can of course vary
according to statistical significance settings, only a small subset of
proteins is usually identified by all methods (Fahey et al., 2011).
Each experimental setup yields a considerable group of proteins 
that are not scored by the other methods. This complementarity 
can be explained not only by differences in detection and quantifi- 
cation methods, but frequently by differences in the experimental 
biological systems as well. Variations include host cell type,
virus isolate (e.g., with a different receptor use and cell tropism),
and timepoint of sampling. The impact of this latter
variable in particular should not be underestimated. It can even determine whether
differentially expressed protein(form)s are found to be up- or
downregulated in the interaction between HIV-1 and its host.
14-3-3 protein epsilon (P62258) is a case in point: it is upregulated
after 4 h p.i. and downregulated later on at 7-10 days p.i. (Ringrose
et al., 2008). Three other 14-3-3 proteins now seem to follow suit:
tau/theta (P27348), gamma (P61981), and zeta/delta (P63104)
are found to be upregulated in our latest MS^E experiments, but
were clearly found to be downregulated in Ringrose et al. (2008).
These proteins can be phosphorylated on serine as well as
threonine, which for example influences their migration on 2D gels, and consequently
pattern and abundance changes can be difficult to interpret. As
mentioned, an interesting recent study of the earliest events
in HIV infection, characterizing the host response at the protein
level in CD4+ SUP-T1 cells at 4, 8, and 20 h p.i. using HIV-1 strain
LAI, was performed by Navare et al. (2012). Comparison of this
study to Ringrose et al. (2008) again shows the virus-cell
interaction to be highly dynamic. Just a few examples: high-mobility
group box 1 (P09429) and cofilin (P23528) are strongly
upregulated 4 h p.i., return to "normal" 4 and 16 h later, while at 7-10
days p.i. both are downregulated; glucose-6-phosphate isomerase 
(P06744) is upregulated at 8 h p.i., but at 7-10 days p.i. is found 
to be downregulated. 

Given all this complexity, it is safe to say that proteomic studies 
on HIV-1 in general, and on HIV-T cell interaction in particu- 
lar, will continue to generate new insights. But it will not be easy 
to translate these snapshot datasets into a comprehensive
mechanistic understanding of all interactions involved (Haarburger and
Pillay, 2010; Zhang et al., 2010). As mentioned, another problem
of proteomic studies is the identification and quantification of
proteins with lower abundances. One possible way to do
this in a relatively unbiased fashion is to sample a proteome via
interaction with random hexameric peptides using Arg, Lys, His,
Phe, Tyr, Trp, Leu, and Val only: ProteoMiner beads (Boschetti and
Righetti, 2008). Samples still seem to be dominated by the most
abundant proteins, however. "Looking at less to see more" might 
be the better way ahead. Such focusing could be on the analysis of 
specific cellular fractions (e.g. mitochondrial preparations), or on 
the selective study of specific classes of protein or PTMs, such as 
the phosphoproteome mentioned above. 

Another example of zooming in on specific protein subsets 
is the use of methods to enrich for cellular factors that directly 
interact with HIV-1 proteins. Exciting results have been obtained
with such "interactome proteomics" methods. The most general
approach was performed with tagged versions of all 18 HIV-1
(poly)proteins. The accessory factors Vif, Vpu, Vpr, and Nef,
Tat and Rev, as well as the polyproteins Gag, Pol, and Gp160,
and their processed products (MA, CA, NC, and p6; PR, RT,
and IN; Gp120 and Gp41, respectively) were used as bait.
Interacting proteins were subjected to proteomic analysis by tryptic
digestion followed by LC-MS/MS, again using an Orbitrap (Jager
et al., 2012a). Interactomics for individual HIV-1 proteins have
also been reported. Vif interactomics revealed how Vif targets the
antiviral APOBEC3G protein for degradation via CBF-β, using
the method just described (Jager et al., 2012a,b). The Rev
protein was used as bait to fish for partners in HeLa cell extracts,
which were then analyzed by MudPIT (Multidimensional Protein
Identification Technology) LC-MS/MS (Naji et al., 2012).
"Indirect" interactomics has also been performed by expressing either
wild-type Vpu or Vpu that is unable to associate with F-box
protein β-TrCP in HeLa cells. Without this interaction Vpu cannot
target certain cellular proteins for degradation by the
proteasome. Potential targets of Vpu were then identified by quantitative
proteomics using SILAC (stable isotope labeling by amino acids
in cell culture) followed by LC-MS/MS (Douglas et al., 2009).
Much attention has been focused on the multiple roles of the CA
(capsid) protein (Mascarenhas and Musier-Forsyth, 2009) and the
Tat protein (Sobhian et al., 2010).

In many cases results of proteomic studies were compared 
with the results of stable RNAi-knockdown experiments. Global 
approaches to identify host cofactors usually consist of screen- 
ing for reduced viral replication upon RNAi knockdown, or 
enhanced replication in case a cellular restriction factor is hit 
(Zhou et al., 2008; An and Winkler, 2010). Such genome-wide
RNAi screens can easily lead to both false positives (by off-target
effects) and false negatives (by inefficient knockdown). These
considerations emphasize the importance of performing a concerted
multi-disciplinary experimental approach, including gene
expression (transcriptome analysis), RNAi and proteomic studies using
different detection and labeling techniques. At the same time, such 
a wider survey will generate more and larger datasets in formats 
which are not at all easily compared: an enormous future challenge 
for bioinformaticians. In light of this we also stress the importance 
of specific, non-"omic," hypothesis-driven follow-up research: not
generating large new datasets but asking highly specific questions. 
The answers might just make integrating these large datasets a 
lot easier. 